Adding optional sidecar FileSet class and related predicates #69

Closed
wants to merge 2 commits into from

escowles commented Sep 1, 2016

Objects always link directly to Files, but can also link to FileSets that link to Files:

  • Current PCDM:

    <object1> a pcdm:Object ;
    pcdm:hasFile <file1> .
    
    <file1> a pcdm:File .
    
  • Optionally add a FileSet as a sidecar:

    <object1> a pcdm:Object ;
    pcdm:hasFile <file1> ;
    pcdm:hasFileSet <fileset1> .
    
    <file1> a pcdm:File .
    
    <fileset1> a pcdm:FileSet ;
    pcdm:managesFile <file1> .
    

See also #68
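
A minimal sketch of the two shapes above, assuming triples are modeled as plain Python tuples (this is an illustration, not code from the PR). It suggests that the sidecar shape only adds triples, so a consumer that reads just `pcdm:hasFile` gets the same answer from either graph. The helper name `files_of` is hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples; the identifiers
# mirror the Turtle examples in the PR description.
CURRENT_PCDM = {
    ("object1", "rdf:type", "pcdm:Object"),
    ("object1", "pcdm:hasFile", "file1"),
    ("file1", "rdf:type", "pcdm:File"),
}

# The sidecar shape keeps every existing triple and adds three more.
SIDECAR = CURRENT_PCDM | {
    ("object1", "pcdm:hasFileSet", "fileset1"),
    ("fileset1", "rdf:type", "pcdm:FileSet"),
    ("fileset1", "pcdm:managesFile", "file1"),
}


def files_of(graph, obj):
    """Return the Files an Object links to directly via pcdm:hasFile."""
    return sorted(o for s, p, o in graph if s == obj and p == "pcdm:hasFile")


# A FileSet-unaware consumer sees the same Files in both graphs.
print(files_of(CURRENT_PCDM, "object1"))
print(files_of(SIDECAR, "object1"))
```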

<rdf:Property rdf:about="http://pcdm.org/models#managedBy">
<rdfs:label xml:lang="en">manages file</rdfs:label>
<rdfs:comment xml:lang="en">Links to the SetFile that manages this File.</rdfs:comment>
<rdfs:domain rdf:resource="http://pcdm.org/models#File"/>

Links to the FileSet that manages this File?

ruebot commented Sep 1, 2016

Do we also need to update the domain and range of hasFile and fileOf?

pcdm/models.rdf

Lines 69 to 86 in d7464e5

<rdf:Property rdf:about="http://pcdm.org/models#hasFile">
<rdfs:label xml:lang="en">has file</rdfs:label>
<rdfs:comment xml:lang="en">Links to a File contained by this Object.</rdfs:comment>
<rdfs:domain rdf:resource="http://pcdm.org/models#Object"/>
<rdfs:range rdf:resource="http://pcdm.org/models#File"/>
<rdfs:subPropertyOf rdf:resource="http://www.openarchives.org/ore/terms/aggregates"/>
<rdfs:isDefinedBy rdf:resource="http://pcdm.org/models#"/>
</rdf:Property>
<rdf:Property rdf:about="http://pcdm.org/models#fileOf">
<rdfs:label xml:lang="en">is file of</rdfs:label>
<rdfs:comment xml:lang="en">Links from a File to its containing Object.</rdfs:comment>
<rdfs:range rdf:resource="http://pcdm.org/models#Object"/>
<rdfs:domain rdf:resource="http://pcdm.org/models#File"/>
<rdfs:subPropertyOf rdf:resource="http://www.openarchives.org/ore/terms/isAggregatedBy"/>
<rdfs:isDefinedBy rdf:resource="http://pcdm.org/models#"/>
<owl:inverseOf rdf:resource="http://pcdm.org/models#hasFile"/>
</rdf:Property>

whikloj commented Sep 1, 2016

@ruebot I don't think so. They should (and do) describe the relationship between a pcdm:Object and a pcdm:File.

ruebot commented Sep 1, 2016

@whikloj ah, yeah, you're right.

See also "sidecar".

facepalm
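
To illustrate the domain/range point in this exchange, here is a rough sketch of RDFS domain and range entailment over tuple-based triples. The `DOMAIN` and `RANGE` tables copy the values from the models.rdf excerpt quoted above; the function name `infer_types` and the tuple representation are assumptions made for the example.

```python
# Domain and range as declared in the quoted models.rdf excerpt.
DOMAIN = {"pcdm:hasFile": "pcdm:Object", "pcdm:fileOf": "pcdm:File"}
RANGE = {"pcdm:hasFile": "pcdm:File", "pcdm:fileOf": "pcdm:Object"}


def infer_types(graph):
    """Apply RDFS domain/range entailment and return (node, class) pairs."""
    types = set()
    for s, p, o in graph:
        if p in DOMAIN:
            types.add((s, DOMAIN[p]))  # the subject gets the domain class
        if p in RANGE:
            types.add((o, RANGE[p]))   # the object gets the range class
    return types


# The sidecar shape uses separate predicates for the FileSet, so hasFile
# still only ever connects an Object to a File.
sidecar = [
    ("object1", "pcdm:hasFile", "file1"),
    ("object1", "pcdm:hasFileSet", "fileset1"),
    ("fileset1", "pcdm:managesFile", "file1"),
]
print(sorted(infer_types(sidecar)))

# If hasFile were pointed at a FileSet instead, the unchanged range
# would entail that the FileSet is also a pcdm:File.
misuse = [("object1", "pcdm:hasFile", "fileset1")]
print(sorted(infer_types(misuse)))
```

Under these assumptions the existing declarations stay valid as long as the FileSet is reached only through the new predicates.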

@@ -66,6 +66,14 @@
<rdfs:isDefinedBy rdf:resource="http://pcdm.org/models#"/>
</rdfs:Class>

<rdfs:Class rdf:about="http://pcdm.org/models#FileSet">

What is FileSet a subclass of: ore:Aggregation directly, or pcdm:Object?

Contributor Author

It should be a subclass of ore:Aggregation.


Then it needs a definition, at the abstract level, of what sorts of metadata it can have. Can it have both technical and descriptive metadata, or only one, and if so, which?

Contributor Author

I think the key metadata is whatever distinguishes the FileSet from other FileSets attached to the same Object: a creation date, some (TBD) vocabulary expressing which one is preferred, a textual/visual/etc. format, or something else. I would categorize that as technical metadata, since its purpose is to decide which FileSet to use in a given situation.

Also digitization (or capture) date/event information, which I'd also say falls mostly under technical metadata. But I agree, we need to scope out FileSet as a class a bit more. (Sorry to jump in)

The case I keep coming back to is a FileSet that represents an act of digitization and is used to group the files that derive from that digitization event. I would like to record the technician or vendor who did the digitizing, and the date it was done, as metadata on the FileSet. Would that still be considered technical metadata, or could one use something like dc:creator?

@azaroth42

:meh: Just makes life difficult for interoperability. I don't see the point compared to an Object that takes this role if it's not required.

escowles commented Sep 1, 2016

@azaroth42 I think the advantage of this approach is that it can be completely ignored if your application doesn't care about FileSets. @tpendtragon also suggested the possibility of omitting the FileSet completely and linking between Files to establish the derivation chain, which you could use to work out the groupings as needed.
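
As a rough illustration of the derivation-chain alternative mentioned in this comment, the sketch below groups Files by the original they ultimately derive from. Everything here is assumed for the example: the thread does not name a File-to-File predicate, so the `DERIVED_FROM` mapping, the file identifiers, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical File-to-File links: derived file -> the file it came from.
DERIVED_FROM = {
    "thumb1": "master1",
    "ocr1": "master1",
    "thumb2": "master2",
}


def root_of(file_id, derived_from):
    """Follow derivation links until reaching a file with no source."""
    seen = set()
    while file_id in derived_from:
        if file_id in seen:
            raise ValueError(f"derivation cycle at {file_id}")
        seen.add(file_id)
        file_id = derived_from[file_id]
    return file_id


def group_by_root(files, derived_from):
    """Group files by the original they ultimately derive from."""
    groups = defaultdict(list)
    for f in sorted(files):
        groups[root_of(f, derived_from)].append(f)
    return dict(groups)


FILES = ["master1", "thumb1", "ocr1", "master2", "thumb2"]
print(group_by_root(FILES, DERIVED_FROM))
```

Each resulting group plays the role a FileSet would otherwise play, at the cost of computing the grouping on every read.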

cmharlow commented Sep 2, 2016

FWIW, not being a voting member or anything, my thoughts on this and #68. Looking at PRs (+ needs re:optional FS) on the table, I prefer this one, with the following points:

  • this avoids some of the possible conflation of Class definitions/usage by not having FileSet as a subclass of Object;
  • I worry about the lack of community scoping on FileSets, and about when you'd use which (FileSet, no FileSet, or both) in implementations, because this opens up possible variance in query paths to get the same information (making my job harder).

On that last point, I'd like these questions answered or discussed really soon if this goes through, albeit I fully understand many are implementation-specific:

  • How are we defining FileSet as a community? (Differences came up in the call)
  • Based on above, updating profiling of FileSet / File to understand what metadata gets asserted where (technical metadata? access metadata?).
  • FileSet membership expectations? (I know, best practices + implementation needs, but good to hash out)
  • When do we recommend using FileSet versus not? (ditto)
  • When would we see both used in the same instance data / application? (ditto)
  • Discussed examples of proposed usage, plus how different implementations would share graphs as inputs/outputs
  • Is something on the table for easier graph sharing between FileSet use and non-use? i.e. logic or something else that tells datastores that this:
pcdm:Object pcdm:hasFileSet pcdm:FileSet pcdm:managesFile pcdm:File

depending on the implementation, has this shortcut or mapping:

pcdm:Object pcdm:hasFile pcdm:File
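
One way such a mapping might look, sketched over tuple-based triples: for every Object that reaches a File through a FileSet, add the direct `pcdm:hasFile` link. This is an assumption about how the shortcut could be materialized, not something the PR defines; the function name `materialize_has_file` is hypothetical.

```python
def materialize_has_file(graph):
    """Add Object -hasFile-> File wherever the longer FileSet path exists."""
    # Index which Files each FileSet manages.
    manages = {}
    for s, p, o in graph:
        if p == "pcdm:managesFile":
            manages.setdefault(s, set()).add(o)

    # For each Object -hasFileSet-> FileSet link, emit the shortcut triples.
    inferred = {
        (obj, "pcdm:hasFile", f)
        for obj, p, fs in graph
        if p == "pcdm:hasFileSet"
        for f in manages.get(fs, ())
    }
    return set(graph) | inferred


# Data that only carries the longer path.
long_path_only = {
    ("object1", "pcdm:hasFileSet", "fileset1"),
    ("fileset1", "pcdm:managesFile", "file1"),
    ("fileset1", "pcdm:managesFile", "file2"),
}
expanded = materialize_has_file(long_path_only)
print(sorted(t for t in expanded if t[1] == "pcdm:hasFile"))
```

Running the function again on its own output adds nothing new, so it could be applied at import time without tracking which graphs were already expanded.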

My notes on both PRs where I worked through examples and thoughts, in case of interest: https://gist.github.com/cmh2166/5e435300a246e3406d9ff0db04cecde2

@cmharlow

Sorry to have stalled the communication. Ignore my comments / questions as needed!

@escowles

My thoughts on @cmh2166's questions:

  • I agree that there's not a good definition, and I hope the FileSet group that @scossu is kicking off will help define what a FileSet is.
  • For implementation, one option is to have the FileSet be completely separate from the Object/File and just link to the Files to indicate that it manages/groups them. This would allow a File to be in multiple FileSets, and FileSets to manage Files across multiple Objects, etc., etc.
    • The other option would be to have the FileSets be the DirectContainers containing the Files. So to have multiple FileSets, you would create multiple containers. This has the advantage of managing the Files using LDP direct containment, and also avoids creating any extra nodes (perhaps assuaging concerns about performance?). This would enforce a strict hierarchy of Objects to FileSets to Files and not allow the many-to-many mapping the previous option would allow.
  • For metadata, I think the boundary between technical metadata and other sorts is always a little blurry. But I'd say that the FileSet should include only technical metadata, such as creation date, digitization/capture event, and possibly some kind of type to indicate which FileSet is preferred (something akin to the File Use Vocab to help differentiate multiple FileSets).
    • I'm not sure if access metadata would be helpful on a sidecar FileSet. If you can access the Files directly, then it seems like controlling access to the FileSet would just obscure some relationships and event info — do we have a use case for that?
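
The difference between the two implementation options above can be sketched as a check on the graph: the container-based option implies a strict Object to FileSet to File tree, while the separate-FileSet option allows many-to-many links. The code below is a hedged illustration over tuple-based triples; the function name and the violation labels are made up for the example.

```python
from collections import defaultdict


def hierarchy_violations(graph):
    """Return sorted (kind, node) pairs that break a strict tree shape."""
    filesets_of_file = defaultdict(set)
    objects_of_fileset = defaultdict(set)
    for s, p, o in graph:
        if p == "pcdm:managesFile":
            filesets_of_file[o].add(s)
        elif p == "pcdm:hasFileSet":
            objects_of_fileset[o].add(s)

    # A File managed by several FileSets, or a FileSet attached to several
    # Objects, is only possible under the many-to-many option.
    problems = [
        ("file-in-many-filesets", f)
        for f, fss in filesets_of_file.items() if len(fss) > 1
    ]
    problems += [
        ("fileset-in-many-objects", fs)
        for fs, objs in objects_of_fileset.items() if len(objs) > 1
    ]
    return sorted(problems)


strict = [
    ("object1", "pcdm:hasFileSet", "fileset1"),
    ("fileset1", "pcdm:managesFile", "file1"),
]
many_to_many = strict + [
    ("object2", "pcdm:hasFileSet", "fileset1"),
    ("fileset2", "pcdm:managesFile", "file1"),
]
print(hierarchy_violations(strict))
print(hierarchy_violations(many_to_many))
```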

@escowles

Closing this old PR since there was never consensus on these issues.

@escowles escowles closed this Apr 18, 2021
@escowles escowles deleted the sidecar_filesets branch April 18, 2021 00:16